125 research outputs found
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods. Comment: Includes appendix. Accepted for ICML 201
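As a rough illustration of how earlier rollouts can be reused through importance sampling, the sketch below reweights returns collected under an old parameter-space search distribution to estimate the expected return under a new one. It is not the estimator from the paper: the one-dimensional Gaussian search distribution, the self-normalized weights, and the names `gaussian_logpdf`, `is_expected_return`, and `toy_return` are all assumptions made for this example.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def is_expected_return(thetas, returns, old_mean, new_mean, std):
    """Self-normalized importance-sampling estimate of the expected return
    under N(new_mean, std^2), reusing rollouts whose policy parameters were
    drawn from N(old_mean, std^2)."""
    log_w = [gaussian_logpdf(t, new_mean, std) - gaussian_logpdf(t, old_mean, std)
             for t in thetas]
    max_log_w = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


def toy_return(theta):
    """Hypothetical deterministic rollout whose return peaks at theta = 2."""
    return -(theta - 2.0) ** 2


rng = random.Random(0)
old_mean, std = 0.0, 1.0
# Old rollouts: one deterministic policy per sampled parameter value.
thetas = [rng.gauss(old_mean, std) for _ in range(5000)]
returns = [toy_return(t) for t in thetas]

# No new system interactions are needed to score a candidate distribution.
on_policy = is_expected_return(thetas, returns, old_mean, old_mean, std)
shifted = is_expected_return(thetas, returns, old_mean, 1.0, std)
```

Here `shifted` scores a search distribution moved toward the optimum using only the stored rollouts, which is the sense in which reuse can save system interactions. The estimate degrades when the new distribution moves far from the old one, because a few samples then dominate the weights.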
Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers
PID control architectures are widely used in industrial applications. Despite
their low number of open parameters, tuning multiple, coupled PID controllers
can become tedious in practice. In this paper, we extend PILCO, a model-based
policy search framework, to automatically tune multivariate PID controllers
purely based on data observed on an otherwise unknown system. The system's
state is extended appropriately to frame the PID policy as a static state
feedback policy. This renders PID tuning possible as the solution of a finite
horizon optimal control problem without further a priori knowledge. The
framework is applied to the task of balancing an inverted pendulum on a seven
degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast
and data-efficient policy learning, even on complex real world problems. Comment: Accepted final version to appear in 2017 IEEE International
Conference on Robotics and Automation (ICRA)
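The state extension described in this abstract can be sketched for a single control loop: once the tracking error is augmented with its integral and derivative, the PID law is a fixed gain vector applied to the extended state. The snippet below is only an illustration under stated assumptions. The first-order plant, the explicit Euler discretization, and the names `extended_state`, `static_feedback`, and `simulate` are hypothetical and are not taken from the paper, which treats multivariate controllers and learns the gains with PILCO.

```python
def extended_state(error, integral, prev_error, dt):
    """Augment the tracking error with its integral and derivative.
    Returns (z, new_integral) with z = [e, integral of e, de/dt]."""
    new_integral = integral + error * dt
    derivative = (error - prev_error) / dt
    return [error, new_integral, derivative], new_integral


def static_feedback(gains, z):
    """Static linear state feedback u = K z with K = [Kp, Ki, Kd]."""
    return sum(k * zi for k, zi in zip(gains, z))


def simulate(gains, setpoint=1.0, dt=0.01, steps=2000):
    """Roll out the policy on a hypothetical first-order plant x' = -x + u."""
    x, integral = 0.0, 0.0
    prev_error = setpoint  # equals the first error, so the first derivative is zero
    for _ in range(steps):
        error = setpoint - x
        z, integral = extended_state(error, integral, prev_error, dt)
        u = static_feedback(gains, z)
        x += dt * (-x + u)  # explicit Euler step of the plant
        prev_error = error
    return x


# PI gains chosen by hand for the toy plant; a policy search would tune them.
final_x = simulate(gains=[2.0, 1.0, 0.0])
```

Because the policy is just the gain vector, tuning the controller amounts to optimizing a handful of numbers against a rollout cost.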
Policy search for imitation learning
Efficient motion planning and possibilities for non-experts to teach new motion primitives are key components for a new generation of robotic systems. In order to be applicable beyond the well-defined context of laboratories and the fixed settings of industrial factories, those machines have to be easily programmable, adapt to dynamic environments, and learn and acquire new skills autonomously. Reinforcement learning in principle solves those learning issues but suffers from the curse of dimensionality. When dealing with complex environments and highly agile hardware platforms like humanoid robots in large or possibly continuous state and action spaces, the reinforcement learning framework becomes computationally infeasible. In recent publications, parametrized policies have been employed to address this problem. One of them, Policy Improvement with Path Integrals (PI^2), has been derived from the transformation of the Hamilton-Jacobi-Bellman (HJB) equation of stochastic optimal control into a path integral using the Feynman-Kac theorem. Applications of PI^2 are so far limited to Dynamic Movement Primitives (DMP) to parametrize the motion policy. Another policy parametrization, the formulation of motion primitives as the solution of an optimization-based planner, has been widely used in other fields (e.g. inverse optimal control) and offers compelling possibilities to formulate characteristic parts of a motion in an abstract sense without specifying too much problem-specific geometry. Imitation learning or learning from demonstration can be seen as a way to bootstrap the acquisition of new behavior and as an efficient way to guide the policy search in a desired direction. Nevertheless, due to imperfect demonstrations, which might be incomplete or contradictory, and also due to noise, the learned behavior might be insufficient.
As observed in the animal kingdom, a final trial-and-error phase guided by the cost and reward of a specific behavior is necessary to obtain a successful behavior. Interestingly, the reinforcement learning framework might offer the tools to govern both learning methods at the same time. Imitation learning can be reformulated as reinforcement learning under a specific reward function, allowing the combination of both learning methods. In this work, the concept of probability-weighted averaging of policy roll-outs as seen in PI^2 is combined with an optimization-based policy representation. The reinforcement learning toolbox and direct policy search are utilized in a way that allows both imitation learning based on arbitrary demonstration types and the imposition of additional objectives on the learned behavior. A black-box evolutionary algorithm, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which can be shown to be closely related to the approach in PI^2, is leveraged to explore the parameter space. This work will experimentally evaluate the suitability of this algorithm for learning motion behavior on a humanoid upper body robotic system. We will focus on learning from different types of demonstrations. The formulation of the reward function for reinforcement learning will be depicted and multiple test scenarios in 2D and 3D will be presented. Finally, the capability of this approach to learn and improve motion primitives is demonstrated on a real robotic system within an obstacle test scenario.
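The probability-weighted averaging of roll-outs mentioned above can be sketched in a few lines: each perturbed parameter vector is rolled out, its cost is mapped to an exponential weight, and the parameters move by the weighted average of the perturbations. This is a simplified, hedged sketch rather than the method of the thesis. The min-max cost normalization, the `temperature` value, the quadratic `toy_cost`, and all function names are assumptions made for illustration.

```python
import math
import random


def pi2_style_update(theta, perturbations, costs, temperature=10.0):
    """Shift theta by the probability-weighted average of the perturbations.
    Lower-cost roll-outs receive exponentially larger weights."""
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # guard against identical costs
    scores = [math.exp(-temperature * (c - c_min) / span) for c in costs]
    total = sum(scores)
    probs = [s / total for s in scores]
    step = [sum(p * eps[d] for p, eps in zip(probs, perturbations))
            for d in range(len(theta))]
    return [t + s for t, s in zip(theta, step)]


def toy_cost(theta):
    """Hypothetical roll-out cost: squared distance to the target (1, -1)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 1.0) ** 2


rng = random.Random(0)
theta = [0.0, 0.0]
initial_cost = toy_cost(theta)
for _ in range(50):
    # Sample 20 perturbed parameter vectors and evaluate one roll-out each.
    perturbations = [[rng.gauss(0.0, 0.3) for _ in theta] for _ in range(20)]
    costs = [toy_cost([t + e for t, e in zip(theta, eps)]) for eps in perturbations]
    theta = pi2_style_update(theta, perturbations, costs)
final_cost = toy_cost(theta)
```

The update needs only cost values, not gradients, which is why it combines naturally with a black-box policy representation such as an optimization-based planner.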
Probabilistic Recurrent State-Space Models
State-space models (SSMs) are a highly expressive model class for learning
patterns in time series data and for system identification. Deterministic
versions of SSMs (e.g. LSTMs) proved extremely successful in modeling complex
time series data. Fully probabilistic SSMs, however, are often found hard to
train, even for smaller problems. To overcome this limitation, we propose a
novel model formulation and a scalable training algorithm based on doubly
stochastic variational inference and Gaussian processes. In contrast to
existing work, the proposed variational approximation allows one to fully
capture the latent state temporal correlations. These correlations are the key
to robust training. The effectiveness of the proposed PR-SSM is evaluated on a
set of real-world benchmark datasets in comparison to state-of-the-art
probabilistic model learning methods. Scalability and robustness are
demonstrated on a high-dimensional problem.
Development of an information data model for a system of incentives for employees and students
The article examines a system of incentives for employees and students. An entity-relationship diagram is drawn up for the process of recording all stages of the document workflow involved. An example form from the developed information system for recording and analyzing the distribution of incentives awarded by employees to students is presented.
A Temporary Pause in the Replication Licensing Restriction Leads to Rereplication during Early Human Cell Differentiation
Gene amplifications in amphibians and flies are known to occur during development and
have been well characterized, unlike in mammalian cells, where they are predominantly investigated
as an attribute of tumors. Recently, we first described gene amplifications in human and mouse neural
stem cells, myoblasts, and mesenchymal stem cells during differentiation. The mechanism leading
to gene amplifications in amphibians and flies depends on endocycles and multiple origin-firings.
So far, there is no knowledge about a comparable mechanism in normal human cells. Here, we
describe rereplication during the early myotube differentiation of human skeletal myoblast cells,
using fiber combing and pulse-treatment with EdU (5-ethynyl-2'-deoxyuridine)/CldU
(5-chloro-2'-deoxyuridine) and IdU (5-iodo-2'-deoxyuridine)/CldU. We found rereplication during a restricted
time window between 2 h and 8 h after differentiation induction. Rereplication was detected in cells
simultaneously with the amplification of the MDM2 gene. Our findings support rereplication as a
mechanism enabling gene amplification in normal human cells.
Design and implementation of a platform for hyperconnected cyber physical systems
The Internet of Things (IoT) is an area of growing importance as more and more computing capability becomes embedded into real world objects and environments. But at the same time IoT is just one component of a widespread shift towards a new age of federation, combining with other trends such as cloud computing, blockchain and automation to create a new hyperconnected infrastructure. This infrastructure will emerge from the convergence of traditional, cloud and IoT-based models of computing, creating a more decentralised, secure and democratic computing platform for the future. But while bringing significant benefits, federation also brings significant problems, in particular the complexity of building, integrating and managing systems built using highly distributed and heterogeneous platforms. In this paper we discuss our work on modelling, deployment and management for this new converged computing environment, leveraging previous work on domain languages, cloud computing and the Web of Things to accelerate and democratize the development of real world hyperconnected systems.
Persisting right-sided chylothorax in a patient with chronic lymphocytic leukemia: a case report
Introduction: Chylothorax caused by chronic lymphocytic leukemia is very rare and the best therapeutic approach, especially the role of modern immunochemotherapy, is not yet defined. Case presentation: We present the case of a 65-year-old male Caucasian patient with right-sided chylothorax caused by a concomitantly diagnosed chronic lymphocytic leukemia. As first-line treatment, four cycles of an immunochemotherapy consisting of fludarabine, cyclophosphamide and rituximab were administered. In addition, our patient received total parenteral nutrition for the first two weeks of treatment. Despite the very good clinical response of the lymphoma to treatment, the chylothorax persisted and percutaneous radiotherapy of the thoracic duct was applied. However, eight weeks after the radiotherapy the chylothorax still persisted and our patient agreed to a surgical intervention. A ligation of the thoracic duct via a muscle-sparing thoracotomy was performed, resulting in a complete cessation of the pleural effusion. Apart from the first two weeks, our patient was treated on an out-patient basis for nearly six months. Conclusion: In this case of chylothorax caused by chronic lymphocytic leukemia, immunochemotherapy in combination with conservative treatment, and even consecutive radiotherapy, were not able to stop pleural effusion, despite the very good clinical response of the chronic lymphocytic leukemia to treatment. Out-patient management using repetitive thoracocenteses can be safe as bridging until definitive surgical ligation of the thoracic duct.
Reprogramming Low-end IoT Devices from the Cloud
The Internet of Things (IoT) consists of a variety of smart connected objects, among which is a category of low-end devices based on micro-controllers. The orchestration of low-end IoT devices is not straightforward because of the lack of generic and holistic solutions articulating cloud-based tools on one hand, and low-end IoT device software on the other hand. In this paper, we describe such a solution, combining a cloud-based IDE, graphical programming, and automatic JavaScript generation. Scripts are pushed over the Internet and over-the-air for the last hop, updating runtime containers hosted on heterogeneous low-end IoT devices running RIOT. We demonstrate a prototype working on common off-the-shelf low-end IoT hardware with as little as 32kB of memory.
Rich Magnetic Phase Diagram of Putative Helimagnet Sr3Fe2O7
The cubic perovskite SrFeO3 was recently reported to host hedgehog- and
skyrmion-lattice phases in a highly symmetric crystal structure which does not
support the Dzyaloshinskii-Moriya interactions commonly invoked to explain such
magnetic order. Hints of a complex magnetic phase diagram have also recently
been found in powder samples of the single-layer Ruddlesden-Popper analog
Sr2FeO4, so a reinvestigation of the bilayer material Sr3Fe2O7,
believed to be a simple helimagnet, is called for. Our magnetization and
dilatometry studies reveal a rich magnetic phase diagram with at least 6
distinct magnetically ordered phases and strong similarities to that of
SrFeO3. In particular, at least one phase is apparently
multiple-q, and the qs are not observed to vary among the
phases. Since Sr3Fe2O7 has only two possible orientations for its
propagation vector, some of the phases are likely exotic multiple-q
order, and it is possible to fully detwin all phases and more readily access
their exotic physics. Comment: 14 pages, 13 figures